Abstract


This report summarizes the research performed to date by the author
and his students on the use of modern multiparameter estimation techniques
in building an attrition rate generator in support of the USMC
Officer Planning and Utility System (OPUS). Three main areas are identified:
the cell aggregation problem; the specifics of parameter estimation; and
the need to match forecasting techniques to the specific application. Most
of the effort has been in the first two of these areas, and much has been
learned. The aggregation problem, i.e., the grouping of personnel cells into
an appropriate number of groups having common size and attrition behavior, has
emerged as the most important problem requiring immediate attention. Its
resolution is expected to lead to a clear policy for multiparameter
estimation. Estimation and forecasting are both affected by the nature of
the data base. It is likely that specific applications will use differing data
bases and differing statistical techniques as well.


EXECUTIVE  SUMMARY 

The author and his students have been working with a number of modern
techniques applied to the problem of estimating attrition (leave the service) rates
for the numerous cells that appear in manpower planning models for the USMC
officer corps. Special attention is given to the "small cell" problem, that is, officer
categories that normally contain only a few personnel. These cells are numerous,
and historical empirical rates for them are generally quite unstable. This
report summarizes what we have learned to date and outlines the plans for
continuing the research.

Work that has applied shrinkage-type estimators to the problem of estimating
officer attrition rates has been reported in [22,50,57]. The methods tested have
been successful in the comparative sense; that is, they perform better than the
raw historical rates that might be used in an ad hoc fashion. But their behavior
in the absolute sense still has erratic aspects. Moreover, we do not have a solid
way to anticipate the areas of unstable performance.

The recent acquisition of a much more refined data tape and the theses by
Larsen and Dickinson [38,22] have led to greater insight into this problem. In
particular, the new data cover ten years and break out officer grade by above zone
versus in or below zone, regular versus reserve, unrestricted versus limited duty, etc.
The thesis by Larsen identifies the important break points in the YCS (years
of commissioned service) scale and some MOS (military occupation specialty)
categories that must be treated separately. The thesis by Dickinson, in addition
to pursuing some isolated details that had been treated presumptively in earlier
work, introduces an empirical Bayes method that appears to do a better
job of shrinking the raw estimates. It seems to manage better the unevenness of
the cell inventories. Finally, some of our problems have also been experienced
by Carter and Rolph [13], so we propose to pursue their suggestions as well.

Our  studies  have  led  us  to  believe  that  the  most  important  item  in  the 
continuation  work  is  the  aggregation  problem.  This  problem  has  two  aspects: 


(i.)   The  grouping  of  cells  into  communities  of  homogeneous  attrition 
behavior. 

(ii.)  The combining or amalgamation of cells in order to meet
minimal cell inventory requirements.

Addressing this need requires some exploration of the data. Because data
extraction is cumbersome, it will be necessary to choose wisely and
study the most germane collections of cells.

Based upon the results of Carter and Rolph, we anticipate that an adequate
solution to the aggregation problem will lead to a clear policy for attrition rate
generation. Once this is accomplished, we can turn to the specific needs and
idiosyncrasies of the various application models. This will include questions of
both short-term and long-term forecasting.

1      INTRODUCTION 

In recent years, the Marine Corps has been phasing its manpower management
into a centrally organized and computerized Officer Planning and Utility System
(OPUS) [15,16,17,18,19,20]. This system contains a number of planning models,
and such models are affected by three general factors: existing inventory
(personnel), projected losses, and projected gains. In order to project the inventory
into various future time periods, it is necessary to use a realistic system of flow
rates. Some of the rates are under administrative control, such as promotions
and job assignments, and of course everyone acquires longevity with the passage of
time. The attrition flow rates, however, can be anticipated only in a statistical
sense. By attrition we mean leaving the service for any reason (e.g., resignation,
discharge, disability, release, retirement), and the circumstances that lead
to these attritions are not under the control of the planner. (Note: Some attritions
are voluntary and some involuntary. For general purposes we assume the
planner is not aware of the involuntary losses in advance.)


Our role in support of OPUS is to develop useful attrition rates so that
losses of this type can be reasonably estimated. The replacement lead
time for planning is seldom short: most replacements enter the service
as young lieutenants, although augmentation from the reserves is also used. The cost of
poor planning is great. Too many planned replacements lead to underutilized
personnel; too few lead to unfilled jobs and the inability to function as required.

The purpose of this report is to gather and summarize what we have learned
about attrition rate generation as it pertains to the USMC officer corps, to
describe the work in progress, and to outline ways to study forecasting methods that
can serve the individual needs of the various models. Sponsors and others
are thus given a current appraisal. This report also serves as a working document for
students and other researchers; the terminology and notation are standardized.

The report is organized as follows. Following this introduction, Section 2 lays the
groundwork: details of the problem description, notation, conventions, data
structure and estimation methods. That section also includes a number
of satellite issues, including a discussion of the measures of effectiveness and
the validation techniques. Section 3 contains summaries of the seven theses
[1,22,34,38,50,57,58] that have been written in support of this project and discusses
how they integrate towards the common goal. Section 4 is devoted to
a brief discussion of future directions. It appears important that the researchers
familiarize themselves with the needs of specific user manpower models; data
structures and forecasting methods should be tailored to them.

2     PROBLEM  DESCRIPTION,  ISSUES,  DETAILS. 

A.  General  Structure  and  Notation 

For the macro view it is convenient to think of the officer "cells" as the
result of cross classifying according to grade (GR), military occupation specialty
(MOS) and length of service (LOS). It will be seen later that further refinement
is useful and sharpens the results. (This will be discussed under Data and
Conventions.) Some of these cells are large (i.e., have a large personnel inventory),
e.g., the grades of first lieutenant or captain with 3-7 years of service and in the
combat arms occupations. Those familiar with the Corps realize that there
are many, many small cells. The GR factor has a pyramid structure, with fewer
officers in the higher grades. Of course, GR is well correlated with LOS, but
not sufficiently so that one of them can be removed from consideration. (Also,
LOS is closely correlated with YCS, years of commissioned service, and there
are instances for which this distinction is important.) Under MOS there is
considerable variability, in that many officers are designated as qualified under
several job codes. Some of the codes are robust in that there is a reasonable
level of transferability; i.e., with a modest amount of training, an officer can
transfer from one job to another. Other codes have high training costs or high
levels of specialization, e.g., the aviation communities and attorneys. Such
considerations are very important to the manpower planner. They also bear
upon the way that we build an attrition rate generator, because the stabilization
of rates for small cells will depend upon our ability to gather together small cells
that have a commonality of characteristics.

The time flow of personnel through the system involves gaining a year on
the LOS scale each year, periodic advancements (or not) in GR, and changes
in MOS (responsibilities increase with experience). The USMC normally has
between 18,000 and 20,000 officers. Although there are dependencies among the
cell flows, we are not prepared to include them when modeling the
attrition aspects of such a large system. Instead, a binomial distribution model
is adopted, and we further presume cell-to-cell independence. The impact of the
independence assumption will be softened by the way that we aggregate cells
and by the estimation technique.

Although the cells are very numerous and their specifications are the result
of cross classification, for purposes of study and development we assume that
homogeneous subsets of cells have been identified and that, within each subset, the
cells are arranged in a single indexed list. The letter $k$ is used generally to represent
the number of cells in a set; the letter $T$ represents the number of years of data to be
used in the estimation process. Thus for $i = 1, \ldots, k$ and $t = 1, \ldots, T$, let

$$N_i(t) = \text{inventory of cell } i \text{ in year } t; \qquad (1)$$

$$Y_i(t) = \text{number of attritions in cell } i \text{ in year } t. \qquad (2)$$

Basically, the raw empirical attrition rate for cell $i$ is the maximum likelihood
estimator (MLE)

$$\hat{p}_i = \frac{\sum_{t=1}^{T} Y_i(t)}{\sum_{t=1}^{T} N_i(t)}. \qquad (3)$$

This works well for large cells, but not for small ones. (E.g., the information in
0/5 is considerably different from that in 0/500, yet the MLE is the same.) The
overall attrition rate for USMC officers has averaged about 10% in recent years. Thus
our statistical "small cell" problem is compounded by a "low rate" problem.
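The small-cell point can be made concrete with a minimal Python sketch of the pooled MLE (3). It is only an illustration, not the project's software; the function names `pooled_mle` and `binomial_standard_error` are hypothetical, and the 10% rate used to compare precision is an assumed value taken from the overall average quoted above.

```python
import math


def pooled_mle(inventory, attritions):
    """Raw empirical attrition rate of eq. (3) for one cell.

    inventory[t]: N_i(t), the cell inventory in year t
    attritions[t]: Y_i(t), the number of attritions in year t
    Returns sum_t Y_i(t) / sum_t N_i(t).
    """
    if len(inventory) != len(attritions):
        raise ValueError("inventory and attritions must cover the same years")
    total_inventory = sum(inventory)
    if total_inventory <= 0:
        raise ValueError("cell has no inventory in the estimation years")
    return sum(attritions) / total_inventory


def binomial_standard_error(rate, inventory):
    """Binomial standard error of an observed rate, evaluated at an assumed true rate."""
    return math.sqrt(rate * (1.0 - rate) / inventory)


# 0/5 and 0/500 give the same MLE ...
print(pooled_mle([5], [0]), pooled_mle([500], [0]))  # prints 0.0 0.0
# ... but at an assumed true rate of 10% their sampling errors differ tenfold.
print(binomial_standard_error(0.10, 5), binomial_standard_error(0.10, 500))
```

The second line prints standard errors of about 0.134 and 0.0134: a cell of five officers cannot distinguish a 0% rate from a 10% rate, which is the instability that shrinkage is meant to address.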

The overall strategy for addressing our problem has two main aspects, which
will be called the aggregation problem and the shrinkage method problem. There
is a rather large variety of ways to manage each, and it appears that they cannot
be treated in isolation but must be managed together.

The aggregation problem was stated earlier, and we repeat it now. We have
spoken of collections of homogeneous subsets of cells that possess a commonality
of behavior with respect to attrition. For our purposes we must emphasize two
facets of this problem:

1.  The identification of adequate numbers of cells whose inventory personnel
are likely to have common attrition behavior.

2.  The grouping together or amalgamation of the small cells in the aggregate
into super cells whose inventory values meet minimal requirements
specified by the user.
Previous rate generators have been concerned only with item 2, because item 1 has no
role if one uses historical rates. The advantage of shrinkage methods comes
from the use of information contained in similar cells. The key is to identify the
similar cells. Hence, item 1 is included in our aggregation problem.

The shrinkage method problem concerns procedures which, for a given
aggregate, must choose a single central rate for that aggregate and shrinkage
factors that shrink each cell MLE, $\hat{p}_i$, towards the central value by an amount
governed by that cell's shrinkage factor.

In the work to date the shrinkage problem has received the greater attention
[22,50,57]. In the last decade or two the statistical literature has displayed
many papers on shrinkage methods for multiparameter estimation problems.
The results, in the light of applications, have been startling and glamorous;
see, e.g., [13,27,29]. Naturally, it has been more exciting (and easier) to try these
methods on our attrition rate problem using ad hoc, but defensible, cell
aggregations.

On the other hand, the aggregation problem has not been totally ignored,
but it has proved to be more difficult, largely because of the cumbersome data
handling problems. The theses by Elseramegy and Larsen [1,38] have dealt with
this problem. An important observation by Carter and Rolph [13; 882-3] is that the
inventory numbers for the cells in an aggregation should not be highly variable.
This principle was not used in choosing the ad hoc aggregates mentioned in the
preceding paragraph.

Returning to the question of shrinkage methods, there is an important general
point that should be made at this time. Most of the methodological
development has used the mathematical setting of independent normal random
variables with common variance; see, e.g., [22,23,24,26,27,36]. Moreover, several
applications [13,27] have been successful using binomial data that have been
transformed to behave more like normal data. Our approach to shrinkage
estimation has followed this lead and has three steps:


1.  Transform the raw cell data via the Freeman-Tukey transformation [27,30].

2.  Apply  the  shrinkage  method  to  the  transformed  data. 

3.  Invert  the  results  to  the  original  scale. 

The Freeman-Tukey transform is an enhancement of the basic arcsine
transform for binomial data, designed to give more stability to the variance
and make the distribution closer to normal. The form that we have been using
is (dropping the cell and time subscripts)

$$X = \frac{\sqrt{N + 0.5}}{2}\left\{\sin^{-1}\!\left(\frac{2Y}{N + 1} - 1\right) + \sin^{-1}\!\left(\frac{2(Y + 1)}{N + 1} - 1\right)\right\}, \qquad (4)$$

where $N$ is the cell inventory and $Y$ is its leaver count. This form appears to be
different from the more customary

$$\sqrt{N + 0.5}\left\{\sin^{-1}\sqrt{\frac{Y}{N + 1}} + \sin^{-1}\sqrt{\frac{Y + 1}{N + 1}}\right\}. \qquad (5)$$

Both have variance approximately equal to one. Because of the identity

$$\sin^{-1}(2p - 1) = 2\sin^{-1}\!\left(\sqrt{p}\right) - \frac{\pi}{2}, \qquad (6)$$

they are effectively the same, differing only by the additive term $\sqrt{N + 0.5}\,\pi/2$. The
former was chosen because it avoids the computation of a large
number of square roots. The shrinkage process is applied to the data $X$ of
eq. (4) after averaging over time and developing a collection of these values for
all the cells in an aggregate. This is described in detail in subsection C.
There are also questions of detail concerning how to invert the result; these too
will be deferred. For now, it suffices to recognize that the transform (4) is the
average of the two values taken by

$$\sqrt{N + 0.5}\,\sin^{-1}(2p - 1) \qquad (7)$$

at $p = Y/(N + 1)$ and $p = (Y + 1)/(N + 1)$,

and if $X^*$ is the shrunken value for $X$, then the shrunken value for $p$ will look
like

$$p^* = \begin{cases} 0, & \text{if } X^* < -\frac{\pi}{2}\sqrt{N + 0.5}, \\ \frac{1}{2}\left\{1 + \sin\!\left(X^*/\sqrt{N + 0.5}\right)\right\}, & \text{otherwise}, \\ 1, & \text{if } X^* > \frac{\pi}{2}\sqrt{N + 0.5}. \end{cases} \qquad (8)$$
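The transform pair (4) and (8) can be sketched in Python as follows. This is a minimal illustration, not the project's software: the function names `ft_transform`, `ft_customary` and `ft_invert` are hypothetical, and the code works on a single $(Y, N)$ pair rather than on the time-averaged values that are actually shrunk. The final line checks numerically that forms (4) and (5) differ by the constant $\sqrt{N + 0.5}\,\pi/2$ implied by identity (6).

```python
import math


def _clamp(v):
    """Keep an arcsine argument inside [-1, 1] despite floating-point rounding."""
    return min(1.0, max(-1.0, v))


def ft_transform(y, n):
    """Freeman-Tukey transform in the form of eq. (4).

    y: leaver count Y; n: cell inventory N (may be non-integer), with 0 <= y <= n.
    """
    scale = math.sqrt(n + 0.5)
    lower = math.asin(_clamp(2.0 * y / (n + 1) - 1.0))
    upper = math.asin(_clamp(2.0 * (y + 1) / (n + 1) - 1.0))
    return 0.5 * scale * (lower + upper)


def ft_customary(y, n):
    """The customary double-arcsine form, eq. (5)."""
    return math.sqrt(n + 0.5) * (
        math.asin(math.sqrt(y / (n + 1))) + math.asin(math.sqrt((y + 1) / (n + 1)))
    )


def ft_invert(x_star, n):
    """Map a (shrunken) transformed value X* back to a rate p*, eq. (8)."""
    scale = math.sqrt(n + 0.5)
    bound = 0.5 * math.pi * scale
    if x_star < -bound:
        return 0.0
    if x_star > bound:
        return 1.0
    return 0.5 * (1.0 + math.sin(x_star / scale))


# Identity (6): the two forms differ by the constant sqrt(N + .5) * pi / 2.
y, n = 3, 30
print(ft_customary(y, n) - ft_transform(y, n), math.sqrt(n + 0.5) * math.pi / 2)
```

Note that (8) inverts (7) exactly, so applying `ft_invert` to an unshrunk $X$ returns a rate between $Y/(N+1)$ and $(Y+1)/(N+1)$ rather than $Y/N$ itself.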
A number of notational conventions have been adopted during the course of
the project. They are used both separately and in concert. We conclude this
subsection with a listing of them:

TS    transformed scale                  MO    mean overage
OS    original scale                     MU    mean underage
ML    maximum likelihood                 MAD   mean absolute deviation
JS    James-Stein                        SSB   sum of squares between groups
LT    limited translation                SSE   sum of squared errors
EB    empirical Bayes                    GR    grade
MOS   military occupation specialty      OF    occupation field
LOS   length of service                  YCS   years of commissioned service
LDO   limited duty officer               UNR   unrestricted
MOE   measure of effectiveness           FOM   figure of merit


B.  Data.   Conventions 


The original data tape supplied by NPRDC contained data for seven years,
1977 through 1983. It was possible to identify 10 grades (warrant officer 0-3, second
lieutenant through colonel), 31 LOS levels (0-30 years, with the final one being 30 or
more), 40 MOS levels (actually OF, the first two digits of the four-digit MOS),
and 8 loss types. Details appear in [57]. This was the data base used in the
theses [1,34,50,57,58].

A more extensive and refined data tape was received in the summer of 1987.
It contained 10 years of data, 1977 through 1986. For a complete description of the
refinements see [38]. For our immediate purposes it suffices to point out that
LOS is replaced by YCS (31 cells); GR is further broken out by UNR/LDO, and
officers who failed of selection for promotion are separated from the others; full MOS
codes, commissioning source (15 levels) and education (4 levels) are available; and
regulars can be separated from reserves.

It is important to draw attention to the distinction between central and
transition data; see [3; p24]. This affects the way that the data are used.
For the earlier tape mentioned above, the cell inventories refer to specific dates
("snapshot" data): the inventory is the number of occupants of the cell at the beginning
of the fiscal year. On the other hand, the attrition count for a cell contains the
number of leavers at any time during the year. If an officer changes cells during
the year and then leaves, the attrition is credited to the cell occupied at the
time of leaving, not to the cell that counts him in its inventory. As an extreme case
of this situation, it is possible for a cell to contain zero inventory and yet record
several leavers.

For this reason the following convention was adopted. First, the cell inventory
is replaced by the average of the beginning- and end-of-year inventories. (Note:
this is possible for all years save the last, which must use only the initial figure.)
Second, the central inventory is defined to be the larger of the average inventory
and the number of leavers. In this way we are assured that $Y_i(t) \le N_i(t)$, and these
central inventories are the $N_i(t)$ values used in all formulas.

For  the  refined  data  tape,  a  different  situation  exists.  The  inventory  figures 
are  recorded  in  units  of  man-quarters.  In  this  case,  for  our  yearly  analysis,  we 
use  the  man-quarter  figure  divided  by  four  in  all  formulas. 
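Both inventory conventions are simple enough to state as code. The sketch below is a hypothetical illustration (`central_inventories` and `inventories_from_man_quarters` are not names from the project); it assumes that the end-of-year inventory for year $t$ is the beginning-of-year snapshot for year $t+1$, which is why the last year has only its initial figure.

```python
def central_inventories(start_of_year, leavers):
    """Central inventory convention for the first (snapshot) data tape.

    start_of_year[t]: cell occupants at the beginning of fiscal year t
    leavers[t]: leavers credited to the cell during year t
    Assumption for this sketch: the end-of-year inventory for year t is the
    beginning-of-year snapshot for year t + 1.
    """
    if len(start_of_year) != len(leavers):
        raise ValueError("series must have the same length")
    central = []
    for t, y in enumerate(leavers):
        if t + 1 < len(start_of_year):
            average = 0.5 * (start_of_year[t] + start_of_year[t + 1])
        else:
            average = float(start_of_year[t])  # last year: initial figure only
        # Taking the larger value guarantees Y_i(t) <= N_i(t).
        central.append(float(max(average, y)))
    return central


def inventories_from_man_quarters(man_quarters):
    """Refined data tape: yearly inventory = man-quarters / 4."""
    return [mq / 4.0 for mq in man_quarters]


# A cell that is empty at both snapshots but records 3 leavers in the first year.
print(central_inventories([0, 0], [3, 0]))  # prints [3.0, 0.0]
```

The usage line shows the extreme case mentioned above: without the second step of the convention, the first year would have a rate of 3/0.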

C.  Concepts  of  Shrinkage  Estimation;  Heuristics 

Perhaps the most familiar setting for describing this idea is that of one-way
analysis of variance (ANOVA). Consider independent random variables $\{X_{ij}\}$
and the distributional model

$$X_{ij} \sim N(\mu_i, \sigma^2), \qquad i = 1, \ldots, k; \quad j = 1, \ldots, n. \qquad (9)$$
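The goal is to estimate $\mu_1, \ldots, \mu_k$. Stein [53] has shown that the obvious
estimators

$$\hat{\mu}_i = \bar{X}_i = \frac{1}{n}\sum_{j=1}^{n} X_{ij}, \qquad i = 1, \ldots, k, \qquad (10)$$

are inadmissible (for $k \ge 3$) under the average squared error loss function

$$L(\mu, \delta) = \frac{1}{k}\sum_{i=1}^{k} E\left[\delta_i - \mu_i\right]^2, \qquad (11)$$

where the $\delta_i = \delta_i(x)$ are the estimating statistics. That is, he constructed
functions $\delta_i(x) \ne \bar{X}_i$ that have smaller values of $L$ for every $\mu$; i.e.,

$$\frac{1}{k}\sum_{i=1}^{k} E\left[\delta_i - \mu_i\right]^2 < \frac{1}{k}\sum_{i=1}^{k} E\left[\bar{X}_i - \mu_i\right]^2,$$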


and the dominating functions $\{\delta_i\}$ are convex combinations of the $\{\bar{X}_i\}$ and the
grand mean $\bar{X} = \frac{1}{k}\sum_{i=1}^{k}\bar{X}_i$. That is,

$$\delta_i = (1 - sh)\,\bar{X}_i + sh\,\bar{X} = \bar{X} + (1 - sh)(\bar{X}_i - \bar{X}), \qquad (12)$$

where $sh$ is the (yet to be specified) shrinkage factor. Equation (12) provides
the structure for all estimators that use fixed shrinkage toward the grand
mean.

Heuristically, we would want the shrinkage factor to be larger (close to unity)
when the $\{\mu_i\}$ are nearly all the same; i.e., the departures of the $\delta_i$ from the grand
mean should be small. By way of contrast, if the $\mu_i$ are highly variable, then the
$\{\delta_i\}$ should not shrink very far from the group means $\{\bar{X}_i\}$. The traditional
analysis of variance technique provides a way of measuring the relative variability
of the $\mu_i$ and, from this, a value for the shrinkage parameter can be produced.

The ANOVA table customarily produces the two sums of squares

$$SSB = n\sum_{i=1}^{k}(\bar{X}_i - \bar{X})^2, \qquad SSE = \sum_{i=1}^{k}\sum_{j=1}^{n}(X_{ij} - \bar{X}_i)^2. \qquad (13)$$
The former, the sum of squares between groups, is proportional to the sample
variance of the $\{\bar{X}_i\}$, and the latter, the sum of squared errors, is proportional to the
estimator for $\sigma^2$. Thus shrinkage should vary inversely with the ratio $SSB/SSE$.
The recommended scaling is

$$sh = \min\left(1, \; \frac{k - 3}{k(n - 1) + 2}\,\frac{SSE}{SSB}\right) \qquad (14)$$

[50, eq. 3.22] and [24, eq. (7.7)]. This form is equivalent to the use of the positive
part of $(1 - sh)$, which has been shown to improve upon the original James-Stein
shrinkage [26]. It will occur to some that a much simpler procedure is available
by merely performing the ANOVA test for $H_0\colon \mu_1 = \mu_2 = \cdots = \mu_k$: if we accept
$H_0$, use $\delta_i = \bar{X}$ for all $i$, and otherwise use $\delta_i = \bar{X}_i$. This "testimator"
procedure is also inadmissible [51].
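Equations (12) through (14) combine into a short procedure. The following is a minimal sketch of that ANOVA heuristic in Python, assuming a balanced layout with $k \ge 3$ groups of $n \ge 2$ observations each; the function name `shrink_toward_grand_mean` is hypothetical, and the code is not the estimator used in the theses, which applies shrinkage to transformed attrition data.

```python
def shrink_toward_grand_mean(groups):
    """Positive-part James-Stein shrinkage toward the grand mean, eqs. (12)-(14).

    groups: k lists of n observations each (balanced one-way layout, eq. (9)).
    Returns (shrunken group means delta_i, shrinkage factor sh).
    """
    k = len(groups)
    if k < 3:
        raise ValueError("need at least k = 3 groups")
    n = len(groups[0])
    if n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need a balanced layout with n >= 2 observations per group")

    means = [sum(g) / n for g in groups]  # X-bar_i, eq. (10)
    grand = sum(means) / k  # grand mean X-bar
    ssb = n * sum((m - grand) ** 2 for m in means)  # eq. (13)
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    if ssb == 0.0:
        sh = 1.0  # all group means coincide: shrink fully
    else:
        sh = min(1.0, (k - 3) * sse / ((k * (n - 1) + 2) * ssb))  # eq. (14)
    shrunk = [grand + (1.0 - sh) * (m - grand) for m in means]  # eq. (12)
    return shrunk, sh


means, sh = shrink_toward_grand_mean([[0, 2], [1, 3], [2, 4], [3, 5]])
print(sh, means)
```

In the usage line $SSB = 10$, $SSE = 8$, $k = 4$ and $n = 2$, so $sh = 8/60 \approx 0.133$, and the group means 1, 2, 3, 4 are pulled toward 2.5, to about 1.2, 2.07, 2.93 and 3.8. With $k = 3$ the factor $k - 3$ vanishes and no shrinkage occurs.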

Multiparameter estimation methods that shrink the individual group estimators
toward some common central value have appeared rather extensively
under the names of Bayes or empirical Bayes procedures. Such procedures
utilize some model enhancements for the data gathering process, and these need
to be reviewed in the light of each particular application. Since our applications
involve binomial and multinomial probabilities, the reader is referred to [6;
Chp. 12] for methods and applications. For our application, a brief pilot study
was made using these methods for the multinomial probabilities of the various
attrition types. The results did not appear promising, and we returned to our
original course.

From  a  theoretical  point  of  view  we  are  engaged  in  an  interesting  conundrum. 
Having  adopted  the  model  of  a  large  number  of  independent  binomial  cells,  we 
know  that  there  can  be  no  Stein  effect  because  the  maximum  likelihood  estimator 
is  admissible.  This  is  true  both  for  squared  error  loss  [35]  and  the  "chi  square 
statistic"  loss  function,  [48;  p284].  Thus  the  justification  of  using  shrinkage 
appears  to  come  from  the  empirical  Bayes  arena.  Yet  our  first  attempt  to  use 
empirical  Bayes  directly  was  not  at  all  encouraging. 
The following is our interpretation of the riddle. The developed empirical
Bayes procedures use beta (or Dirichlet) prior distributions. These are
the conjugate priors that facilitate the calculations. It is known that this
subfamily does not perform well compared to maximum likelihood estimates when
the cell probabilities are extreme (close to zero or one). Our lack of success
is therefore probably due to the fact that the attrition rates are small: an overall
long-term average of about 10% per year. We credit our encouraging results to
the idea that the basic strategy (transform, shrink, invert the transform)
corresponds to an empirical Bayes procedure in an implicit way; see [21] for a
general discussion. Others have had success doing this in a binomial setting
[13,27]. The fact that the binomial distribution is well approximated by a
normal distribution surely plays a role. One should also consider the thoughts of
Berger [5].

Lastly,  we  must  keep  in  mind  the  weaknesses  of  our  model.  Perhaps  the 
most  important  point  here  is  the  unlikelihood  of  year  to  year  stationarity.  The 
ultimate  validation  must  somehow  model  and  make  reasonable  allowances  for 
these  temporal  changes.  The  mixing  of  "snapshot"  and  central  data  is  also  a 
problem,  but  we  believe  this  is  largely  one  of  noise  rather  than  one  of  structure. 
The  independent  cell  binomial  model,  although  thought  to  be  robust,  could  be 
improved  upon  using  general  flow  models.  These  latter  models  would  be  much 
more  cumbersome  to  use  on  such  a  very  large  scale. 

D.  Loss  Functions.  Measures  of  Effectiveness.  Validation 

Several  loss  functions  or  measures  of  effectiveness  (MOE's)  have  been  applied 
in  this  project.  Each  serves  its  own  purposes.  A  disquieting  aspect  of  the 
research  to  date  has  been  the  fact  that  an  estimation  technique  that  works  well 
with  one  MOE  may  make  a  poor  showing  using  another  one. 

Initially we applied the James-Stein estimator. This estimator was designed
to perform well for normally distributed data under the squared error loss
function

$$L(\mu, \delta) = \frac{1}{k}\sum_{i=1}^{k}(\delta_i - \mu_i)^2, \qquad (15)$$

where $\delta = (\delta_1, \ldots, \delta_k)$ are the estimating statistics and $\mu = (\mu_1, \ldots, \mu_k)$ are the
means to be estimated. (For validation purposes $\mu_i$ is replaced by the transformed
data for the $i$th cell during the validation year.) Thus, this MOE was
used to compare estimating schemes in the transformed scale, that is, after
transforming and shrinking, but before inverting the shrunken estimates back
to the original scale. These MOE values are identified by the words "transformed
scale" (TS) squared error loss. They serve to measure how well
the "shrinkers" are performing compared to what is specified by the supporting
theory. The transform is scaled to produce a variance of unity, so we are looking
for values of $L$ near one.
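The "near one" benchmark can be checked by simulation. The sketch below is a hypothetical illustration (`ts_squared_error` and `simulated_raw_loss` are not project names); it reads (15) as an average over the $k$ cells, and it assumes unshrunk estimates whose errors are exactly $N(0, 1)$, which is the idealization the transform aims for, with arbitrary true means drawn uniformly for the demonstration.

```python
import random


def ts_squared_error(estimates, validation_values):
    """Transformed-scale (TS) squared error loss, eq. (15), averaged over k cells.

    estimates: delta_i, the (shrunken) estimates on the transformed scale
    validation_values: transformed validation-year data standing in for mu_i
    """
    if not estimates or len(estimates) != len(validation_values):
        raise ValueError("need two equal-length, non-empty sequences")
    k = len(estimates)
    return sum((d - m) ** 2 for d, m in zip(estimates, validation_values)) / k


def simulated_raw_loss(k, seed=0):
    """Loss of unshrunk estimates whose errors are N(0, 1), as the transform intends."""
    rng = random.Random(seed)
    truth = [rng.uniform(-3.0, 3.0) for _ in range(k)]
    raw = [mu + rng.gauss(0.0, 1.0) for mu in truth]
    return ts_squared_error(raw, truth)


print(simulated_raw_loss(20000))  # close to 1
```

A shrinkage scheme that helps should drive this figure below one; a value well above one suggests that the unit-variance assumption behind the transform is failing for the aggregate.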

Since the manpower planner cares little about performance on the transformed
scale, and cares greatly about performance on the original scale,
comparisons were also made using chi square statistics:

$$\chi^2 = \sum_{i=1}^{k}\frac{(a_i - e_i)^2}{n_i p_i (1 - p_i)}, \qquad (16)$$

where $e_i$ = estimated number of attritions in the $i$th cell; $a_i$ = actual number of
attritions in the $i$th cell for the validation year; $n_i$ = inventory for the $i$th cell
in the validation year; and $p_i = e_i/n_i$. If the model is correct and the estimators
are doing their job, this measure has approximately a chi square distribution with $k$
degrees of freedom. This means that its expected value is $k$, its variance is $2k$, and an
absolute standard is available. There is a deceptive point, however. In a number
of instances there are cells with nonzero values for $a_i$, and yet the maximum
likelihood estimator $\hat{p}_i$ is either 0 or 1. In such cases the denominator of (16)
is zero and the MOE cannot be computed. Rather than allow the information
from the entire aggregate to be lost, we adopted the expedient of truncating the
number of cells: $k$ is reduced to $k'$ (the number of usable cells), and the MOE
is computed and printed. This expedient has the effect of giving an unnatural
advantage to the maximum likelihood estimators, and the reader must interpret the
results in the light of this point. No such truncation is applied for the competing
shrinkage estimators, so comparisons become more difficult.
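A minimal Python sketch of the chi square MOE (16) with the truncation expedient follows. The function name `chi_square_moe` is hypothetical; the sketch drops every cell whose denominator is zero (including the case $a_i = 0$, $p_i = 0$, where the term is $0/0$) and reports $k'$ so that the statistic can be compared against its expected value $k'$.

```python
def chi_square_moe(estimated, actual, inventory):
    """Chi square MOE of eq. (16) with the truncation expedient.

    estimated[i]: e_i, estimated attritions; actual[i]: a_i; inventory[i]: n_i.
    Cells with p_i = e_i / n_i equal to 0 or 1 (zero denominator) are dropped,
    so k is reduced to k'.  Returns (statistic, k_prime).
    """
    statistic = 0.0
    k_prime = 0
    for e, a, n in zip(estimated, actual, inventory):
        if n <= 0:
            continue  # no inventory: the cell cannot contribute
        p = e / n
        denominator = n * p * (1.0 - p)
        if denominator <= 0.0:
            continue  # truncated cell: MOE term undefined
        statistic += (a - e) ** 2 / denominator
        k_prime += 1
    return statistic, k_prime


stat, k_prime = chi_square_moe([2.0, 0.0, 5.0], [3, 1, 5], [10, 10, 20])
print(stat, k_prime)  # the second cell (p_i = 0) is dropped
```

In the usage line only two cells survive, and the second cell, where the estimate missed an actual attrition entirely, contributes nothing; this is the unnatural advantage to the maximum likelihood estimators described above.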

Discussion with NPRDC personnel over the MOE questions raised the issue
that the chi square MOE is really a weighted squared error loss MOE that was
chosen for its statistical properties. A measure is needed that is of more direct
service to the manpower analyst. These thoughts have led to the recognition
that (i) an average magnitude of errors is more useful, and (ii) the cost of
overestimating is not the same as the cost of underestimating, even if the magnitudes
are the same. Since actual costs are not available and are likely to change among
the aggregates, we adopted a general-purpose method that allows the user to
consider the magnitudes of underage and overage separately:

$$MO = \frac{1}{k}\sum_{i=1}^{k}(e_i - a_i)^+, \qquad MU = \frac{1}{k}\sum_{i=1}^{k}(a_i - e_i)^+, \qquad MAD = MO + MU, \qquad (17)$$

where MO stands for mean overage, MU for mean underage, MAD for mean
absolute deviation, and the plus superscript denotes the positive part.

Unlike the previous two MOEs, (17) offers no theoretical way to judge the
adequacy of estimation schemes. Thus one should be prepared to compute
some empirical savings figures. Let $e_i(c)$ denote the attrition estimates for
the $i$th cell using current methods and $e_i(*)$ those using proposed methods. Using these
values to produce $MO(c)$, $MO(*)$, $MU(c)$, $MU(*)$, one can then compute the
relative savings figures

$$MO(*)/MO(c) \quad \text{and} \quad MU(*)/MU(c) \qquad (18)$$

in order to make judgments about proposed procedures.
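The overage and underage figures (17) and the savings ratios (18) can be sketched in Python as below. The names `overage_underage` and `savings_ratios` are hypothetical, and reporting a ratio as NaN when the current method's figure is zero is an assumption of this sketch, not a convention from the report.

```python
import math


def overage_underage(estimated, actual):
    """MO, MU and MAD of eq. (17); (x)+ is the positive part max(x, 0)."""
    if not estimated or len(estimated) != len(actual):
        raise ValueError("need two equal-length, non-empty sequences")
    k = len(estimated)
    mo = sum(max(e - a, 0.0) for e, a in zip(estimated, actual)) / k
    mu = sum(max(a - e, 0.0) for e, a in zip(estimated, actual)) / k
    return mo, mu, mo + mu


def savings_ratios(proposed, current, actual):
    """Relative savings figures of eq. (18): MO(*)/MO(c) and MU(*)/MU(c)."""
    mo_p, mu_p, _ = overage_underage(proposed, actual)
    mo_c, mu_c, _ = overage_underage(current, actual)

    def ratio(new, old):
        # Assumption: report NaN when the current method has no overage/underage.
        return new / old if old > 0 else math.nan

    return ratio(mo_p, mo_c), ratio(mu_p, mu_c)


actual = [2, 2, 4]
print(overage_underage([3, 1, 4], actual))
print(savings_ratios([2.5, 1.5, 4], [3, 1, 4], actual))  # below 1 favors the proposal
```

Keeping MO and MU separate lets the user weigh the two ratios differently for each aggregate, since the costs of overage and underage are not assumed equal.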

In  summary  then,  we  are  looking  for  transformed  scale  loss  figures  of  about 
one,  original  scale  chi  square  figures  of  about  k,  and  the  best  looking  set  of 
ratios  for  savings  in  underage  and  overage  without  having  any  absolute  figure 
as  a  goal. 
The details of validation require designating which data are used for developing the attrition estimates, e, and which are reserved for validation to provide the actuals, a. In the earlier works, it was arbitrarily decided to use the first four years (77-80, first data tape) for estimation and the last three (81-83) for validation [22,50,57]. The results indicate that the validations for 82 and 83 (two or more years into the future) are uniformly poor. It was concluded that there must be a time series effect and that questions of forecasting must ultimately be faced. More importantly, it was decided for immediate work to base all comparisons and conclusions upon the 1981 validation figures (one year into the future).

A  complete  cross  validation  [28,55]  is  being  planned  for  the  empirical  Bayes 
estimator,  [Section  4.2].  The  more  refined  (ten  years)  data  tape  will  be  used 
and  each  estimation  calculation  will  use  nine  years  of  data.  That  is,  each  of  the 
ten  years  will  be  taken  out  successively,  case  by  case,  for  validation  use  while 
the  remaining  nine  are  used  to  develop  the  estimators.  This  is  what  we  mean 
by  a  complete  cross  validation. 
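The complete cross validation described above is a leave-one-year-out loop. A minimal sketch, assuming the ten years are simply labelled 1 to 10 (the tape's actual fiscal years are not given here) and leaving the estimation and scoring steps as comments:

```python
from typing import Iterator, List, Sequence, Tuple


def complete_cross_validation(years: Sequence[int]) -> Iterator[Tuple[List[int], int]]:
    """Yield (estimation_years, validation_year), holding out each year exactly once."""
    years = list(years)
    for i, held_out in enumerate(years):
        # The remaining years form the estimation (training) set for this case.
        yield years[:i] + years[i + 1:], held_out


# Hypothetical labels 1..10 for the ten years on the refined data tape.
for estimation_years, validation_year in complete_cross_validation(range(1, 11)):
    # Estimate on estimation_years, then score the held-out year with the MOEs.
    print(validation_year, estimation_years)
```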

3      THESIS  SUMMARIES 

Seven  Master's  theses  have  been  produced  by  this  project.  Each  has  made 
important contributions to the understanding of the problem. A brief summary
of  each  will  be  given  in  this  section,  but  the  emphasis  will  be  largely  in  terms  of  its 
bearing  upon  our  two  main  problems:  aggregation  and  estimation.  On  occasion, 
some  of  the  important  peripheral  and  supporting  results  will  be  mentioned,  but 
lightly. 

1. Tucker, D.D. [57]. This thesis is the initial one in the series. Major Tucker spent his experience tour at Headquarters USMC, used this opportunity for familiarization purposes, and did a superb job of obtaining background information and laying a proper base for others to work on the problem. His historical remarks, comments on the officer planning system, promotion prospects by rank, and coding of the structural zeroes (cells having zero inventory because of system structure) by MOS category are most useful. Further, he compiled a number of profile and other macro statistics that allow the researcher to envision how the system works. This thesis also contains the formatting information for the first data tape.

Major Tucker tested three estimation schemes: maximum likelihood (3), James-Stein (12) and (14), and minimax. We display the minimax estimator here,

$$p_i^{(m)} = \frac{1}{1 + \sqrt{N_i(\cdot)}}\left[\frac{Y_i(\cdot)}{\sqrt{N_i(\cdot)}} + \frac{1}{2}\right] \qquad (19)$$

where Y_i(·) and N_i(·) refer to Y_i(t) and N_i(t) summed over the estimation years. Explicit values of the average loss (11) appear in Table XVII [57, p66].
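Evaluating (19) numerically shows why the minimax estimator was judged too conservative for small cells. The sketch below assumes (19) is the standard binomial minimax form, which is algebraically the same as (Y + √N/2)/(N + √N), with Y and N the leavers and inventory summed over the estimation years; `minimax_rate` is a hypothetical helper name.

```python
import math


def minimax_rate(leavers: int, inventory: int) -> float:
    """Minimax estimate (19): [Y/sqrt(N) + 1/2] / (1 + sqrt(N)), i.e. (Y + sqrt(N)/2) / (N + sqrt(N))."""
    if inventory <= 0:
        raise ValueError("inventory must be positive")
    root_n = math.sqrt(inventory)
    return (leavers / root_n + 0.5) / (1.0 + root_n)


for y, n in [(0, 4), (1, 100)]:
    print(y, n, "MLE:", y / n, "minimax:", round(minimax_rate(y, n), 4))
```

For a cell with no leavers in an inventory of 4, the MLE is 0 but the minimax estimate is 1/6; with 1 leaver out of 100 it gives about 0.055 against an MLE of 0.01. The pull toward 1/2 is strongest exactly where attrition rates are small.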

This thesis used an ad hoc aggregation scheme that specified eight sets of officers: first lieutenants for each of four MOS groups and lieutenant colonels for each of the same four MOS groups. The MOS groups are:

1.  Aviators  (one  OF  code); 

2.  Ground  Combat  (three  OF  codes); 

3.  Combat  Support  (three  OF  codes);  and 

4. Combat Service Support (all other OF codes).

[57; p15]. All LOS cells that were not structural zeros when cross-classified with GR and MOS were also included.

The  result  of  this  study  gave  very  substantial  support  to  the  James-Stein 
estimator.  The  minimax  estimator  was  deemed  to  be  too  conservative  for  small 
cell  use  and  was  discarded. 

2. Robinson, J.R. [50]. Building on the work of Tucker, the immediate follow-on effort was directed toward giving more attention to the small cells and taking a less hurried look at the basic James-Stein and maximum likelihood estimators. This was undertaken by Major J. R. Robinson, who also introduced the limited translation shrinkage alternative (see [24]); performed a more thorough validation using both the transformed scale, eq.(15), and the original scale, eq.(16); and uncovered the fact that some of the arbitrary choices made earlier can have rather deep effects. Robinson also introduced the TSCA (transformed scale cell average) estimator.

This thesis used the same ad hoc aggregation scheme that was introduced by Tucker, except that the catchall aggregate, Combat Service Support, was dropped. The new TSCA estimator is computed by applying the Freeman-Tukey transformation (4) to the individual N_i(t) and Y_i(t), eqs.(1) and (2). The resulting X_i(t) is then averaged over time. To invert to the original scale, one uses this value, call it X*, together with n* = the time average of inventory over the estimation years, and applies (8). Notice how this differs from the MLE, which averages over time prior to applying (4). Note further that TSCA may be viewed as James-Stein with zero shrinkage.
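The only difference between TSCA and the MLE is the order of operations: TSCA transforms each year and then averages, while the MLE pools over the years first. The sketch below makes that concrete. Because eqs.(4) and (8) are not reproduced in this section, it substitutes assumptions: the unscaled Freeman-Tukey double arcsine stands in for the transform, and the crude inverse p = sin²(x/2), which ignores the inventory n* that (8) uses, stands in for the inversion. Both helpers are hypothetical stand-ins, not the report's formulas.

```python
import math
from typing import Sequence


def double_arcsine(y: int, n: int) -> float:
    # ASSUMED stand-in for transform (4): unscaled Freeman-Tukey double arcsine.
    return math.asin(math.sqrt(y / (n + 1))) + math.asin(math.sqrt((y + 1) / (n + 1)))


def crude_inverse(x: float) -> float:
    # ASSUMED stand-in for inversion (8); unlike (8) it ignores the inventory n*.
    return math.sin(x / 2.0) ** 2


def mle_rate(leavers: Sequence[int], inventory: Sequence[int]) -> float:
    # MLE: pool over the years first, then form the rate.
    return sum(leavers) / sum(inventory)


def tsca_rate(leavers: Sequence[int], inventory: Sequence[int]) -> float:
    # TSCA: transform each year, average on the transformed scale, then invert.
    xs = [double_arcsine(y, n) for y, n in zip(leavers, inventory) if n > 0]
    return crude_inverse(sum(xs) / len(xs))


leavers, inventory = [0, 1, 0, 2], [12, 10, 15, 9]
print(mle_rate(leavers, inventory), tsca_rate(leavers, inventory))
```

With these numbers the pooled MLE is 3/46 (about 0.065) while the TSCA value comes out near 0.08, partly because the transform never maps an empty year to exactly zero.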

The limited translation James-Stein (LTJS) estimator is complicated, and the reader is referred elsewhere, [24] and [50; App.C], for details. We will, however, draw attention to some of its features. The basic idea is to reduce the amount of shrinkage in the tails of the distribution of the transformed values, X_i. This has the effect of reducing the individual errors for the extreme cells at the cost of (hopefully) only modest increases in total loss, eq.(11). To achieve this one must select a tuning constant, d, representing the number of standard deviations into the tails that one allows for full shrinkage before switching to reduced shrinkage. Robinson showed that this parameter, d, changed with the aggregated set. He also studied some very small cells, i.e., inventory ranges (0,5) and (6,10).

The results of this thesis were sobering. First of all, the TSCA, MLE, JS, and LTJS estimators were all competitive. This was especially striking because in Tucker's work it appeared that JS was superior to MLE. Investigation into this matter showed that the method of counting cells in an aggregate can have a sharp effect. For example, Tucker used the number of non-structural-zero cells whereas Robinson used the number of non-empty cells. Thus the former used k = 57 and 48 for, respectively, ground combat first lieutenant and combat support lieutenant colonel; Robinson's values were k = 45 and 40 for these two groups. He also excluded the sampling zeros: cells of zero inventory because of sampling and not because of organizational structure. This change had the effect of returning MLE to competitiveness.

Earlier it was pointed out that MLE estimators allow values of zero and one, both of which make (16) uncomputable. When such values are removed in order to use the chi-square measure, the values of k for the above listed cells become 35 and 23. These facts dramatize our problem of cell definition and aggregation.

It was also discovered that Tucker's version of eq.(14) [57; p55, Step 3] is in error. In addition, Major Robinson's extensive study of the very small cells revealed unstable behavior; that is, performance at variance with that prescribed by theory. It may be better for the very small cells to be pooled together into single, larger cells rather than be exposed to this instability.

3. Amin Elseramegy, H.[1] This thesis reports our first attempt to treat the aggregation problem. The Naval Postgraduate School had recently acquired the very modern and glamorous CART (Classification and Regression Trees) program. Our plan was to try using this program to form aggregates of cells that exhibited homogeneity of behavior with regard to attrition [1,9].

We ran into a number of difficulties, and the effort of learning to use the program became a major task. Our data base is much larger than the CART system provides for, as installed on our IBM 3033 system. It was necessary to partition it arbitrarily into nine sets so that each could be run separately. Moreover, to conserve computer memory, the LOS scale was treated as a quantitative interval scale and not as a set of categorical variables. Again the first four years were used for estimation (learning samples, in CART parlance), and the raw attrition rate was used as the response variable.

Perhaps the point of greater import was that CART is a "top down" system. It starts with all of the data (that memory can hold) in a single aggregate and goes through a succession of binary splits, each split making the most dramatic division possible on the scale of the response variable. A stopping rule terminates the process and the result is a binary tree. A new case can be dropped through the tree; it follows the path prescribed by the succession of splits and comes to rest in a terminal node of the tree. That node specifies the attrition rate. This top down approach provided us with useful break points in the LOS (interval) scale in the earlier splits. The later splits were a mix-and-match set of GR and MOS combinations that had no apparent structure. Our applications require structure for customer-oriented organizational purposes.

Thus  the  experience  was  useful  in  that  it  drew  attention  to  the  need  for  a 
"bottom  up"  approach  to  aggregation.  Some  organizationally  meaningful  cells 
should  be  brought  together  first.  Then  we  must  pool  to  get  reasonably  sized 
inventory  numbers  before  computing  response  variables.  We  also  learned  that 
our  ad  hoc  practice  of  using  all  (non  structural-zero)  LOS  cells  in  an  aggregate 
is  a  poor  one. 
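The report does not specify a bottom-up pooling rule, so the following is only a hypothetical illustration of the idea: start from individual cells and repeatedly merge the group with the smallest inventory into the group whose pooled attrition rate is closest, until every group reaches a user-chosen inventory threshold. A real rule would restrict merges to organizationally meaningful neighbors (same MOS family, adjacent LOS), which this sketch deliberately ignores; all names are illustrative.

```python
from typing import List, Sequence, Tuple

Group = Tuple[List[str], int, int]  # (cell labels, pooled leavers, pooled inventory)


def pooled_rate(group: Group) -> float:
    _, leavers, inventory = group
    return leavers / inventory if inventory > 0 else 0.0  # empty groups treated as rate 0


def pool_bottom_up(cells: Sequence[Tuple[str, int, int]], min_inventory: int) -> List[Group]:
    """HYPOTHETICAL rule: merge the smallest group into the group with the nearest pooled rate."""
    groups: List[Group] = [([label], y, n) for label, y, n in cells]
    while len(groups) > 1:
        small = min(range(len(groups)), key=lambda i: groups[i][2])
        if groups[small][2] >= min_inventory:
            break  # every group now meets the inventory threshold
        target = min((j for j in range(len(groups)) if j != small),
                     key=lambda j: abs(pooled_rate(groups[j]) - pooled_rate(groups[small])))
        a, b = groups[small], groups[target]
        merged: Group = (a[0] + b[0], a[1] + b[1], a[2] + b[2])
        groups = [g for i, g in enumerate(groups) if i not in (small, target)] + [merged]
    return groups


cells = [("A", 1, 5), ("B", 10, 100), ("C", 0, 3), ("D", 2, 40)]
print(pool_bottom_up(cells, min_inventory=20))
```

Pooling happens on the counts, so each merged group's rate is computed only after its inventory has been built up, in line with the "pool first, then compute response variables" lesson above.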

4. Hogan, D.L.[34] Attention had been drawn to the fact that the validation figures for time lags of two and three years were poor and were not used in the comparison of estimation schemes. That is, the estimates produced from the (equally weighted) data of four estimation years gave tenable values for the first year's validation, but not for the other two years. This led us to believe that there is a time series effect, and Lieutenant Hogan explored the exponential smoothing technique [11,34] in order to treat it.

Broadly, this technique provides a way to update estimates yearly with the passage of time. It weights the recent past more heavily and discounts the distant past exponentially using a smoothing constant, α. There is also an interesting side advantage in that storage requirements are minimal.

Lieutenant Hogan worked with the four competitive estimators identified by Robinson, and the same six aggregates. The smoothing constant α was chosen to minimize the MOEs (or FOMs, figures of merit).
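Simple exponential smoothing, with α chosen by minimizing a figure of merit, can be sketched as below. The sketch makes three assumptions for illustration: one-step-ahead squared error stands in for the MOEs Hogan actually used, the smoothed level is initialized at the first year's rate, and α is searched on a coarse grid.

```python
from typing import List, Optional, Sequence, Tuple


def one_step_forecasts(rates: Sequence[float], alpha: float) -> Tuple[List[float], float]:
    """Simple exponential smoothing; returns forecasts for years 2..T and the forecast for year T+1."""
    level = rates[0]                      # ASSUMED initialization: first year's rate
    forecasts = []
    for x in rates[1:]:
        forecasts.append(level)           # forecast made before seeing this year's rate
        level = alpha * x + (1.0 - alpha) * level
    return forecasts, level


def choose_alpha(rates: Sequence[float], grid: Optional[Sequence[float]] = None) -> float:
    """Pick alpha minimizing one-step squared error (a stand-in for the report's MOEs)."""
    if grid is None:
        grid = [i / 20 for i in range(1, 20)]   # 0.05, 0.10, ..., 0.95

    def sse(alpha: float) -> float:
        forecasts, _ = one_step_forecasts(rates, alpha)
        return sum((x - f) ** 2 for x, f in zip(rates[1:], forecasts))

    return min(grid, key=sse)


rising = [0.05, 0.06, 0.07, 0.08, 0.09, 0.10]
print(choose_alpha(rising))
```

On a steadily rising series the search picks the largest α on the grid, because heavier weight on the latest year reduces the lag behind the trend.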

The results indicated that exponential smoothing does indeed give relief to the problem of estimating rates using larger time lags. The values of α for the various aggregates are larger than those generally encountered in other applications of exponential smoothing, and they are not as stable as we would like. In particular, the aviation community has emerged as being quite singular.

5.  Yacin,  N.[58]  In  response  to  intradepartmental  pressures,  it  was  decided 
to  explore  the  logistic  regression  alternative  using  as  carriers  LOS  (an  interval 
scale)  and  GR  (an  ordered  scale).  Indeed,  if  successful  the  regression  approach 
is  preferred,  [31,49,58]. 

Generally, but not always, shrinkage estimation methods (treating these variables as levels of two factors) perform better. The logistic regression made its best showing for 3 < LOS < 9 and 4 < GR < 6. Perhaps the most useful aspects of this study are the qualitative results:

(i) For 0 < LOS < 3: attrition rates are chaotic as young officers "test the waters".

(ii)  For  3  <  LOS  <  9:  attrition  rates  decline  with  increasing  LOS  as 
officers  commit  themselves  to  longer  second  and  third  contracts. 
One would think that advancement in grade would also correlate with a lower rate, but we don't see that. There appear to be other kinds of shifts influencing the attrition behavior in these years.

(iii)  For  9  <  LOS  <  19:  the  maturing  career  commitment  has  been 
made  and  rates  decline  with  increasing  LOS  and  GR. 

(iv)  For  19  <  LOS  <  30:  since  advancement  opportunities  of  the 
senior  officer  are  quite  limited  we  see  rates  increasing  with  LOS 
and  decreasing  with  advances  in  GR. 

6. Larsen, R.W.[38] Substantial progress on the aggregation problem was made in this thesis. This is the first work that utilized the second, more refined, data tape, and it contains the format for that tape. Captain Larsen presented a description of the current, dynamic (user specified threshold) aggregation method and followed the general plan specified by it. He applied a hierarchical clustering algorithm to the new data (see [2, 37, 38]) and exposed the relative importance of some special MOS cells and YCS intervals. The separation of the aviation community into several groups is most revealing and undoubtedly explains much of the instability encountered earlier when the estimation schemes were applied to that group aggregated as a whole.

Equally  important  are  the  break  points  in  the  YCS  scale  uncovered  by  this 
thesis.  Thus,  a  new  order  of  putting  cells  together  is  indicated;  a  different  set 
of  priorities  is  established. 

7. Dickinson, C. R.[22] This thesis also used the newer, more refined data tape. We remind the reader that this tape recorded inventory in man-quarters, whereas the previous one gave counts at the beginning of the fiscal year. This distinction appears to have a very noticeable effect. Also, LOS is replaced by YCS. Captain Dickinson repeated the Robinson calculations (MLE, TSCA, JS) for the same groups and included an empirical Bayes estimator as well. The results show that all are competitive in the comparative sense and that the MOE numbers have greater stability than those exhibited using the other tape. Also, they are distinctly different from the earlier values.

In  addition,  Captain  Dickinson  performed  some  side  studies  treating  issues 
that  had  been  treated  "out  of  hand"  in  earlier  works.  Specifically: 

(i) Approximation and use of the unequal variances on the transformed scale,
(ii) Study of the effect of alternative inversion formulae,
(iii) Choice of inventory values for inversion of the transform,
(iv) Graphical description of non-uniform shrinkage and nonlinear shrinkage curves on the original scale.
To elaborate, item (i) is necessary in order to develop useful empirical Bayes estimators [13,15,25,26,43]. Otherwise, the shrinkage is uniform for all groups, and that situation should be adequately covered by the basic James-Stein estimator. Item (ii) deals with the question of returning to the original scale from the transformed scale. The basic inversion, eq.(8), has been used in all earlier studies, but some competitors have appeared in the open literature [13,27,42]. Certainly, the FTE (Freeman-Tukey exact, Ref [42]) must be considered seriously, since the basic inversion, eq.(8), is necessarily only approximate. The problems encountered in this area are connected with those addressed in item (iii). The choice of inventory, n, to be used in the inversion varies with the group index. This leads to the awkward condition that full shrinkage to the grand mean on the transformed scale does not invert to a common attrition rate on the original scale.

Turning to item (iii), the FTE was discovered by Miller [42], who also recommended the use of the harmonic mean (over time) when choosing an inventory value for purposes of inversion. Captain Dickinson studied this question via computer simulations using arithmetic, geometric, and harmonic means and the small values of attrition rates that are of interest to us. The arithmetic mean made the best showing, probably because of the small rates.

The graphical shrinkage paths, item (iv), are interesting but not alarming. The individual paths are smooth and appear to have monotone derivatives; the bow is not severe; straight-line approximations would not be damaging.

In  the  eleventh  hour  of  his  work,  Captain  Dickinson  experimented  with  a 
weighted  empirical  Bayes  estimator,  [22,  App.E].  The  result  is  very  positive  and 
this  estimator  is  recommended  for  further  study. 
4      CURRENT STATUS AND CONTINUATION PLANS

1.  Technical  Insights 

It has long been known that the method of least squares (for our problem this means the use of raw empirical rates) picks up too many of the idiosyncrasies of the "training" data set and leads to disappointing performance when used for predictions. There are a number of ways of combating this, many of them having an ad hoc flavor and generally rather intensive in terms of computation. The class of James-Stein and other shrinkage methods possesses very salable analytic support; their use should become widespread.

The Efron-Morris paper "Data Analysis Using Stein's Estimator and Its Generalizations" [27] presents reasons why more applications have not been forthcoming. The authors also present three applications of the method that provide very dramatic improvements over classical methods and serve as models for use by practitioners. Their toxoplasmosis prevalence rate example is especially convincing. The data are completely real and the gains are of the order of 200 percent.

Their  baseball  example  is  a  closer  prototype  to  our  application  and  the  gains 
are  given  as  350  percent.  This  certainly  appears  attractive.  The  fact  that 
the  authors  were  able  to  practice  some  selectivity  in  this  example  has  emerged 
as  a  point  of  importance  along  with  the  natural  distinctions  between  batting 
averages,  attrition  rates  and  other  aspects  of  our  problem.  We  take  a  moment 
to  discuss  the  insights  that  have  been  developed  regarding  these  things. 

In the batting average example, 18 players were selected and the results of their first 45 times at bat were used for the estimation or training data set. The shrunken estimated batting averages were then compared with the end-of-season values, with great success. The player selection scheme [27; pg 312] was driven largely by the goal of exactly 45 times at bat on certain dates; all rates exceeded .15. Thus all n_i = n, and this value is sufficiently large that the variance of the transform (essentially Freeman-Tukey) is essentially constant for .15 < p < .85. Moreover, this degree of selectivity also ensures that there is no issue as to how to invert the transform after shrinkage.

For the attrition rate problem the vast majority of rates are below .15, and the inventory values are seldom as high as 45 and certainly not constant. We have no guide as to how large k (number of cells) should be other than k > 4. Our experiences have led us to believe that unevenness in inventory from cell to cell (and over time) is detracting from the performance of our estimators. Some isolated calculations have shown that the method of inversion, eq.(8), is an important issue. Thus our application breaks new ground and, when completed, will make an important contribution to the lore.

It appears that Carter and Rolph found similar issues. They state ([13, p382], paraphrased) that the empirical Bayes estimators will perform best if applied separately to groups (aggregates) of cells that have comparable size and similar rates. It is extremely interesting to note that their empirical Bayes estimators made their best performance (showed the greatest savings) for cells with low rates. This is also the experience of Fay and Herriot [29].

Some of the other details of this Carter-Rolph paper are not clear. The transform inversion formula [13; pg 882] seems to have a misreferenced origin. As pointed out in [22], it appears to perform shrinkage towards p = 0.5 on the original scale. Since this detail interacts with the particular empirical Bayes method used, seeking further guidance from this paper is not attractive.

Thus  we  believe  that  the  ad  hoc  aggregates  chosen  for  our  pilot  studies  are 
detracting  from  our  ability  to  discriminate  among  competing  estimators.  The 
next  major  effort  should  be  a  hands  on  study  of  the  data  following  the  lead 
of  Larsen  [38]  and  developing  sensible  aggregates  that  fit  well  with  the  natural 
organization  of  the  USMC  officer  corps. 
2.   Current Estimation Recommendations

Empirical Bayes estimators require knowledge of the variance for each cell. On the original scale this is given by the familiar formula for the binomial proportion

$$\operatorname{Var}(Y/n) = p(1-p)/n \qquad (20)$$

which can change sharply with both p and n. The situation is more pleasant on the transformed scale. Indeed the Freeman-Tukey transformation was designed to stabilize the variance at one, and it does so for the non-extreme values of p. However, in our problem there are important combinations of n and p for which the variance is smaller than one. Fortunately, a single interpolatory curve has been found that fits this variance function very well for broad combinations of n and p. Details appear in [22; App. C]; a skeleton summary follows.

Let μ = E(X) when X is given by eq.(4). Then, to a very good approximation for N > 3,

$$\operatorname{Var}(X) = \min\{1, V(\mu)\} \qquad (21)$$

where

$$V(\mu) = a\,(\mu - \pi/2)^{b_1}\,(\mu - 1 - \pi/2)^{b_2} \qquad (22)$$

with

a = 1.6835,   b_1 = -.8934,   b_2 = .9881

and μ > 1.001 + π/2. (Clearly the formula breaks down for μ - 1 - π/2 negative.) The value one that appears in (21) dominates for (about) μ - π/2 > 2.2. The formula (22) comes into play for N > 2 and p > .001, with the upper limit given by a function of N and p; see [27].
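The variance approximation (21)-(22) is straightforward to code. The sketch reads (21) as the smaller of 1 and V(μ), which is the reading consistent with the surrounding text: the variance falls below one for some combinations of n and p, and the value one takes over for μ - π/2 > 2.2. `ft_variance` is a hypothetical name.

```python
import math

A_COEF, B1, B2 = 1.6835, -0.8934, 0.9881   # constants quoted with (22)


def ft_variance(mu: float) -> float:
    """Approximate Var(X) on the transformed scale, reading (21) as min{1, V(mu)}."""
    x = mu - math.pi / 2
    if x <= 1.001:
        raise ValueError("(22) needs mu > 1.001 + pi/2")   # (x - 1) would approach or cross zero
    v = A_COEF * x ** B1 * (x - 1.0) ** B2
    return min(1.0, v)


for shift in (1.5, 2.2, 3.0):
    print(shift, round(ft_variance(math.pi / 2 + shift), 3))
```

Evaluating it gives V(μ) of about 0.59 at μ - π/2 = 1.5 and about 0.997 at μ - π/2 = 2.2, which matches the stated crossover where the value one begins to dominate.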

Our  policy  for  empirical  Bayes  estimation  is  described  next.     Consider  a 
single  cell  and  let  T  be  the  number  of  years  in  the  estimation  set.  Then,  using 
X(t) from (4), with the argument t inserted to denote the year, let

$$XT(t) = X(t)/\sqrt{0.5 + N(t)}, \qquad t = 1, \ldots, T \qquad (23)$$

except that if N(t) = 0 then XT(t) does not exist and T is reduced accordingly. Then form the time average

$$XTB = \frac{1}{T}\sum_{t=1}^{T} XT(t) \qquad (24)$$


We require a single variance figure for the XT(t), denote it VT, and define it implicitly by

$$\operatorname{Var}(XTB) = \frac{1}{T^2}\sum_{t=1}^{T}\operatorname{Var}(XT(t)) = VT/T \qquad (25)$$

and (21) is used in the summand of (25) with μ replaced by XT(t). Thus XTB is a time average of transformed values for the cell and VT is our estimate of its population variance.
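Equations (24)-(25) reduce each cell to two numbers, XTB and VT. A minimal sketch, assuming the yearly XT(t) values have already been computed from (23), with `None` marking years where N(t) = 0. The per-year variance function is passed in (in the report it would be the approximation (21)-(22) evaluated at XT(t)), so the sketch does not depend on the exact form of the transform; `cell_summary` is a hypothetical name.

```python
from typing import Callable, Optional, Sequence, Tuple


def cell_summary(xt: Sequence[Optional[float]],
                 var_fn: Callable[[float], float]) -> Tuple[float, float]:
    """Return (XTB, VT) for one cell per (24)-(25); None marks a year with N(t) = 0."""
    present = [x for x in xt if x is not None]   # such years are dropped and T shrinks
    T = len(present)
    if T == 0:
        raise ValueError("cell has no year with positive inventory")
    xtb = sum(present) / T                        # (24): time average
    # (25): Var(XTB) = sum_t Var(XT(t)) / T^2 = VT / T, hence VT = sum_t Var(XT(t)) / T
    vt = sum(var_fn(x) for x in present) / T
    return xtb, vt


# Toy usage with a constant unit variance standing in for (21)-(22).
print(cell_summary([2.0, None, 4.0], lambda x: 1.0))   # (3.0, 1.0)
```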

Now the empirical Bayes value for our cell is the convex combination

$$XEB = \frac{A}{A + VT}\,XTB + \frac{VT}{A + VT}\,XBB \qquad (26)$$

where  XEB,  XTB  and  VT  change  from  cell  to  cell  within  the  aggregate;  XBB 
is  a  single  central  value  (weighted  average)  and  A  is  the  variance  of  the  prior 
distribution  of  cell  means.  Both  A  and  XBB  must  be  estimated  jointly  using 
an  iterative  algorithm.  The  details  are  next. 

Let k be the number of cells, as before, and attach subscripts (i = 1, ..., k) to previously defined quantities that depend upon the cell. We use A_0 for the "previous" value of A in our iterative algorithm and initialize with A = 0. First set

$$A_0 \leftarrow A \qquad (27)$$

Next define temporary values {α_i} and {γ_i} by means of

$$\alpha_i = \frac{1}{A + VT_i}, \qquad \gamma_i = \frac{\alpha_i}{\sum_{j=1}^{k}\alpha_j}$$
Then compute the weighted mean and the updated value of A:

$$XBB = \sum_{i=1}^{k}\gamma_i\,XTB_i, \qquad A \leftarrow A_0 - \frac{k - 1 - \sum_{i=1}^{k}\alpha_i\,(XTB_i - XBB)^2}{\sum_{i=1}^{k}\alpha_i^2\,(XTB_i - XBB)^2} \qquad (28)$$

Now, if the result is A < 0, set A = 0 and exit from the loop. Also, if |A - A_0| < ε (say 10^{-3}), we are finished and should exit. Otherwise, return to (27) and repeat the steps.

Having determined XBB and A, we use these values in (26) to produce XEB_i, i = 1, ..., k. Notice that the amount of shrinkage changes with the cell (i.e., the VT_i are not necessarily all equal to one), and if A = 0 then the shrinkage is 100 percent to XBB.
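The iteration (27)-(28) and the final shrinkage (26) can be sketched as follows. The update for A is written here as a Newton step on the moment equation Σ α_i (XTB_i - XBB)² = k - 1, namely A ← A_0 - (k - 1 - Σ α_i (XTB_i - XBB)²) / Σ α_i² (XTB_i - XBB)²; this is our reading of (28), in the style of Fay and Herriot [29], and should be checked against [22] before use. The sketch also recomputes XBB at the final A, a minor choice of ours. `empirical_bayes` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def empirical_bayes(xtb: Sequence[float], vt: Sequence[float],
                    eps: float = 1e-3, max_iter: int = 200) -> Tuple[List[float], float, float]:
    """Return (XEB_i list, XBB, A) for one aggregate, following (26)-(28)."""
    k = len(xtb)
    if k < 2 or k != len(vt) or min(vt) <= 0:
        raise ValueError("need >= 2 cells with matching, positive variances VT_i")

    def weights_and_center(a_val: float) -> Tuple[List[float], float]:
        alpha = [1.0 / (a_val + v) for v in vt]                   # alpha_i = 1/(A + VT_i)
        xbb = sum(w * x for w, x in zip(alpha, xtb)) / sum(alpha)  # sum_i gamma_i XTB_i
        return alpha, xbb

    A = 0.0                                    # initialize with A = 0
    for _ in range(max_iter):
        A0 = A                                 # (27)
        alpha, xbb = weights_and_center(A0)
        s1 = sum(w * (x - xbb) ** 2 for w, x in zip(alpha, xtb))
        s2 = sum(w * w * (x - xbb) ** 2 for w, x in zip(alpha, xtb))
        if s2 == 0.0:                          # identical cell averages: no prior spread
            A = 0.0
            break
        A = A0 - (k - 1 - s1) / s2             # ASSUMED Newton-type reading of (28)
        if A < 0.0:                            # negative estimate: truncate at zero and stop
            A = 0.0
            break
        if abs(A - A0) < eps:                  # converged
            break

    _, xbb = weights_and_center(A)             # center recomputed at the final A
    # (26): convex combination of the cell's own average and the common center
    xeb = [(A * x + v * xbb) / (A + v) for x, v in zip(xtb, vt)]
    return xeb, xbb, A


xeb, xbb, A = empirical_bayes([1.0, 2.0, 3.0, 10.0], [1.0, 1.0, 1.0, 1.0])
print(round(A, 2), round(xbb, 2), [round(v, 2) for v in xeb])
```

For four cells with XTB = 1, 2, 3, 10 and all VT_i = 1, the iteration settles at A ≈ 15.67, so each cell keeps about 94 percent of its own average. Cells whose averages differ by much less than their sampling spread drive A to zero and shrink fully to XBB.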

We pause to note that the previously tested non-uniform shrinkage method (LTJS, [50]) selected cells with extreme time average values for reduced shrinkage. The empirical Bayes method chooses cells with lower variance for diminished shrinkage.

3.  Forecasting 

Often, the applications involve forecasting. There are great differences among the users as to the length of the forecast period. One application involves monthly forecasts while, at the other extreme, another involves yearly forecasts up to seven years into the future. The forecasting method should be tailored to the needs of the application. There are a number of techniques available [7,8,10,34,41,52,56].

Bres and Rowe [10] report success with Naval Officer attrition rate forecasting using a third order autoregressive model combined with a linear program that solves for the coefficients using MAD. But this success has diminished with time (Rowe, personal communication), and other techniques are being developed [46,52]. Also, NPRDC is working with some econometric models.
The situation is different for very near term forecasting. Seventy percent of the yearly leavers do so in the summertime (Morton, DSI, personal communication). Often the users are experimenting with contingencies and sundry incentive plans. For such applications, Bayesian methods could prove useful [7,17,34].

We have paid little attention to forecasting thus far in our project. The two and three year validations were abandoned because of their instability. The exponential smoothing applied by Hogan showed improvements but behaved erratically. We believe that a quality policy for managing the aggregation problem will do much toward laying the base to study forecasting. It appears that the blend of shrinkage estimation and multiparameter forecasting has yet to be treated in the open literature. This presents a challenge.